Overview

Dataset statistics

Number of variables: 10
Number of observations: 5706
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 445.9 KiB
Average record size in memory: 80.0 B

Variable types

Numeric: 10

Alerts

gross_revenue is highly correlated with qtde_invoices and 4 other fields [High correlation]
recency_days is highly correlated with df_index and 1 other fields [High correlation]
qtde_invoices is highly correlated with gross_revenue and 2 other fields [High correlation]
qtde_items is highly correlated with gross_revenue and 4 other fields [High correlation]
qtde_products is highly correlated with gross_revenue and 2 other fields [High correlation]
frequency is highly correlated with qtde_invoices [High correlation]
qtde_returns is highly correlated with gross_revenue and 2 other fields [High correlation]
avg_ticket is highly correlated with gross_revenue and 2 other fields [High correlation]
df_index is highly correlated with customer_id and 1 other fields [High correlation]
customer_id is highly correlated with df_index and 1 other fields [High correlation]
gross_revenue is highly skewed (γ1 = 21.64886448) [Skewed]
qtde_items is highly skewed (γ1 = 23.07755455) [Skewed]
avg_ticket is highly skewed (γ1 = 53.30032708) [Skewed]
qtde_returns is highly skewed (γ1 = 52.0788511) [Skewed]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]
customer_id has unique values [Unique]
qtde_returns has 4200 (73.6%) zeros [Zeros]
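
The Skewed and Zeros alerts rest on two simple quantities: the skewness coefficient γ1 and the share of zero values. The sketch below shows one way to compute them in plain Python. It uses the population moment form of γ1, which is an assumption; the report's own estimator may apply a small-sample bias correction, so values can differ slightly. The function names are illustrative and not part of pandas-profiling.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5.

    Assumption: population (biased) form; a profiling tool may use a
    bias-corrected estimator instead.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# A long right tail gives positive skewness; symmetric data gives zero.
print(skewness([1, 1, 1, 1, 10]))
print(skewness([1, 2, 3, 4, 5]))

# Zero share for qtde_returns, from the counts in the alert: 4200 of 5706.
print(f"{100 * 4200 / 5706:.1f}%")
```

A strongly positive γ1, as reported for gross_revenue or avg_ticket, indicates that a few very large values dominate the right tail.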

Reproduction

Analysis started: 2022-11-07 18:16:14.825968
Analysis finished: 2022-11-07 18:16:37.841708
Duration: 23.02 seconds
Software version: pandas-profiling v3.4.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 5706
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2901.328601
Minimum: 0
Maximum: 5796
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 290.25
Q1: 1457.25
Median: 2903.5
Q3: 4348.75
95-th percentile: 5504.75
Maximum: 5796
Range: 5796
Interquartile range (IQR): 2891.5

Descriptive statistics

Standard deviation: 1672.098262
Coefficient of variation (CV): 0.5763215726
Kurtosis: -1.196222843
Mean: 2901.328601
Median Absolute Deviation (MAD): 1446
Skewness: -0.00358408334
Sum: 16554981
Variance: 2795912.598
Monotonicity: Strictly increasing
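
Several of the statistics listed for each variable are derived from others. As a quick consistency check, the sketch below recomputes four of them from the df_index figures reported here, using the standard definitions (CV as standard deviation over mean, variance as squared standard deviation, IQR as Q3 minus Q1, range as maximum minus minimum). The variable names are illustrative only.

```python
# Values copied from the df_index statistics in this report.
mean = 2901.328601
std = 1672.098262
q1, q3 = 1457.25, 4348.75
minimum, maximum = 0, 5796

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV       = {cv:.10f}  (report lists 0.5763215726)")
print(f"Variance = {variance:.3f}  (report lists 2795912.598)")
print(f"IQR      = {iqr}  (report lists 2891.5)")
print(f"Range    = {value_range}  (report lists 5796)")
```

Small differences in the last digits are expected because the inputs are themselves rounded in the report.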
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
0                     1       < 0.1%
3852                  1       < 0.1%
3872                  1       < 0.1%
3871                  1       < 0.1%
3870                  1       < 0.1%
3869                  1       < 0.1%
3868                  1       < 0.1%
3867                  1       < 0.1%
3866                  1       < 0.1%
3865                  1       < 0.1%
Other values (5696)   5696    99.8%

Value                 Count   Frequency (%)
0                     1       < 0.1%
1                     1       < 0.1%
2                     1       < 0.1%
3                     1       < 0.1%
4                     1       < 0.1%
5                     1       < 0.1%
6                     1       < 0.1%
7                     1       < 0.1%
8                     1       < 0.1%
9                     1       < 0.1%

Value                 Count   Frequency (%)
5796                  1       < 0.1%
5795                  1       < 0.1%
5794                  1       < 0.1%
5793                  1       < 0.1%
5792                  1       < 0.1%
5791                  1       < 0.1%
5790                  1       < 0.1%
5789                  1       < 0.1%
5788                  1       < 0.1%
5787                  1       < 0.1%

customer_id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 5706
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16602.89485
Minimum: 12346
Maximum: 22709
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 12346
5-th percentile: 12697.75
Q1: 14288.25
Median: 16229.5
Q3: 18212.75
95-th percentile: 21746
Maximum: 22709
Range: 10363
Interquartile range (IQR): 3924.5

Descriptive statistics

Standard deviation: 2810.924735
Coefficient of variation (CV): 0.1693032909
Kurtosis: -0.822889109
Mean: 16602.89485
Median Absolute Deviation (MAD): 1963
Skewness: 0.4410421788
Sum: 94736118
Variance: 7901297.868
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
17850                 1       < 0.1%
17997                 1       < 0.1%
17891                 1       < 0.1%
16498                 1       < 0.1%
13745                 1       < 0.1%
15584                 1       < 0.1%
21089                 1       < 0.1%
21088                 1       < 0.1%
21087                 1       < 0.1%
21086                 1       < 0.1%
Other values (5696)   5696    99.8%

Value                 Count   Frequency (%)
12346                 1       < 0.1%
12347                 1       < 0.1%
12348                 1       < 0.1%
12349                 1       < 0.1%
12350                 1       < 0.1%
12352                 1       < 0.1%
12353                 1       < 0.1%
12354                 1       < 0.1%
12355                 1       < 0.1%
12356                 1       < 0.1%

Value                 Count   Frequency (%)
22709                 1       < 0.1%
22708                 1       < 0.1%
22707                 1       < 0.1%
22706                 1       < 0.1%
22705                 1       < 0.1%
22704                 1       < 0.1%
22700                 1       < 0.1%
22699                 1       < 0.1%
22696                 1       < 0.1%
22695                 1       < 0.1%

gross_revenue
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5460
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1801.444075
Minimum: 0.42
Maximum: 279138.02
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 13.325
Q1: 236.4925
Median: 612.99
Q3: 1568.89
95-th percentile: 5312.765
Maximum: 279138.02
Range: 279137.6
Interquartile range (IQR): 1332.3975

Descriptive statistics

Standard deviation: 7889.980137
Coefficient of variation (CV): 4.379808537
Kurtosis: 609.3150354
Mean: 1801.444075
Median Absolute Deviation (MAD): 478.52
Skewness: 21.64886448
Sum: 10279039.89
Variance: 62251786.56
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
7.95                  9       0.2%
2.95                  8       0.1%
4.95                  8       0.1%
1.25                  8       0.1%
3.75                  7       0.1%
1.65                  7       0.1%
12.75                 7       0.1%
7.5                   6       0.1%
4.25                  6       0.1%
5.95                  6       0.1%
Other values (5450)   5634    98.7%

Value                 Count   Frequency (%)
0.42                  1       < 0.1%
0.65                  1       < 0.1%
0.79                  1       < 0.1%
0.84                  4       0.1%
0.85                  3       0.1%
1.07                  1       < 0.1%
1.25                  8       0.1%
1.44                  1       < 0.1%
1.65                  7       0.1%
1.69                  1       < 0.1%

Value                 Count   Frequency (%)
279138.02             1       < 0.1%
259657.3              1       < 0.1%
194550.79             1       < 0.1%
168472.5              1       < 0.1%
140450.72             1       < 0.1%
124564.53             1       < 0.1%
117379.63             1       < 0.1%
91062.38              1       < 0.1%
77183.6               1       < 0.1%
72882.09              1       < 0.1%

recency_days
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 304
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 116.9053628
Minimum: 0
Maximum: 373
Zeros: 38
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 23
Median: 71
Q3: 199.75
95-th percentile: 338
Maximum: 373
Range: 373
Interquartile range (IQR): 176.75

Descriptive statistics

Standard deviation: 111.5749925
Coefficient of variation (CV): 0.9544043989
Kurtosis: -0.6399946632
Mean: 116.9053628
Median Absolute Deviation (MAD): 61
Skewness: 0.8145762609
Sum: 667062
Variance: 12448.97895
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
1                     110     1.9%
4                     105     1.8%
3                     98      1.7%
2                     92      1.6%
10                    86      1.5%
8                     82      1.4%
9                     79      1.4%
17                    79      1.4%
7                     78      1.4%
15                    67      1.2%
Other values (294)    4830    84.6%

Value                 Count   Frequency (%)
0                     38      0.7%
1                     110     1.9%
2                     92      1.6%
3                     98      1.7%
4                     105     1.8%
5                     52      0.9%
7                     78      1.4%
8                     82      1.4%
9                     79      1.4%
10                    86      1.5%

Value                 Count   Frequency (%)
373                   23      0.4%
372                   23      0.4%
371                   17      0.3%
369                   4       0.1%
368                   13      0.2%
367                   16      0.3%
366                   15      0.3%
365                   19      0.3%
364                   11      0.2%
362                   7       0.1%

qtde_invoices
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 56
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.467753242
Minimum: 1
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 11
Maximum: 206
Range: 205
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 6.807257129
Coefficient of variation (CV): 1.963016586
Kurtosis: 302.6109884
Mean: 3.467753242
Median Absolute Deviation (MAD): 0
Skewness: 13.20390977
Sum: 19787
Variance: 46.33874962
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
1                     2876    50.4%
2                     830     14.5%
3                     504     8.8%
4                     394     6.9%
5                     237     4.2%
6                     173     3.0%
7                     138     2.4%
8                     98      1.7%
9                     69      1.2%
10                    55      1.0%
Other values (46)     332     5.8%

Value                 Count   Frequency (%)
1                     2876    50.4%
2                     830     14.5%
3                     504     8.8%
4                     394     6.9%
5                     237     4.2%
6                     173     3.0%
7                     138     2.4%
8                     98      1.7%
9                     69      1.2%
10                    55      1.0%

Value                 Count   Frequency (%)
206                   1       < 0.1%
199                   1       < 0.1%
124                   1       < 0.1%
97                    1       < 0.1%
91                    2       < 0.1%
86                    1       < 0.1%
72                    1       < 0.1%
62                    2       < 0.1%
60                    1       < 0.1%
57                    1       < 0.1%

qtde_items
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 1841
Distinct (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 977.4258675
Minimum: 1
Maximum: 196844
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 106
Median: 317.5
Q3: 804
95-th percentile: 2939
Maximum: 196844
Range: 196843
Interquartile range (IQR): 698

Descriptive statistics

Standard deviation: 4424.85589
Coefficient of variation (CV): 4.527050119
Kurtosis: 786.8375748
Mean: 977.4258675
Median Absolute Deviation (MAD): 253.5
Skewness: 23.07755455
Sum: 5577192
Variance: 19579349.65
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
1                     114     2.0%
2                     73      1.3%
3                     51      0.9%
4                     49      0.9%
5                     35      0.6%
6                     29      0.5%
12                    25      0.4%
88                    22      0.4%
72                    21      0.4%
7                     20      0.4%
Other values (1831)   5267    92.3%

Value                 Count   Frequency (%)
1                     114     2.0%
2                     73      1.3%
3                     51      0.9%
4                     49      0.9%
5                     35      0.6%
6                     29      0.5%
7                     20      0.4%
8                     18      0.3%
9                     7       0.1%
10                    17      0.3%

Value                 Count   Frequency (%)
196844                1       < 0.1%
80997                 1       < 0.1%
80263                 1       < 0.1%
77373                 1       < 0.1%
74215                 1       < 0.1%
69993                 1       < 0.1%
64549                 1       < 0.1%
64124                 1       < 0.1%
63312                 1       < 0.1%
58343                 1       < 0.1%

qtde_products
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 529
Distinct (%): 9.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 92.51962846
Minimum: 1
Maximum: 7838
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 14
Median: 41
Q3: 106
95-th percentile: 331.75
Maximum: 7838
Range: 7837
Interquartile range (IQR): 92

Descriptive statistics

Standard deviation: 210.3911487
Coefficient of variation (CV): 2.27401636
Kurtosis: 511.1621495
Mean: 92.51962846
Median Absolute Deviation (MAD): 33
Skewness: 17.7677891
Sum: 527917
Variance: 44264.43546
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
1                     256     4.5%
2                     149     2.6%
3                     109     1.9%
10                    101     1.8%
6                     99      1.7%
9                     92      1.6%
5                     91      1.6%
4                     87      1.5%
11                    84      1.5%
7                     83      1.5%
Other values (519)    4555    79.8%

Value                 Count   Frequency (%)
1                     256     4.5%
2                     149     2.6%
3                     109     1.9%
4                     87      1.5%
5                     91      1.6%
6                     99      1.7%
7                     83      1.5%
8                     81      1.4%
9                     92      1.6%
10                    101     1.8%

Value                 Count   Frequency (%)
7838                  1       < 0.1%
5673                  1       < 0.1%
5095                  1       < 0.1%
4580                  1       < 0.1%
2698                  1       < 0.1%
2379                  1       < 0.1%
2060                  1       < 0.1%
1818                  1       < 0.1%
1673                  1       < 0.1%
1637                  1       < 0.1%

avg_ticket
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 5512
Distinct (%): 96.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.52923839
Minimum: 0.42
Maximum: 77183.6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 3.4625
Q1: 7.95
Median: 15.86033158
Q3: 21.97690213
95-th percentile: 76.32
Maximum: 77183.6
Range: 77183.18
Interquartile range (IQR): 14.02690213

Descriptive statistics

Standard deviation: 1280.323898
Coefficient of variation (CV): 23.47958519
Kurtosis: 2956.973403
Mean: 54.52923839
Median Absolute Deviation (MAD): 7.496531367
Skewness: 53.30032708
Sum: 311143.8343
Variance: 1639229.285
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
3.75                  11      0.2%
4.95                  10      0.2%
1.25                  9       0.2%
2.95                  9       0.2%
7.95                  8       0.1%
8.25                  7       0.1%
1.65                  7       0.1%
12.75                 7       0.1%
3.35                  6       0.1%
4.15                  6       0.1%
Other values (5502)   5626    98.6%

Value                 Count   Frequency (%)
0.42                  3       0.1%
0.535                 1       < 0.1%
0.65                  1       < 0.1%
0.79                  1       < 0.1%
0.8371428571          1       < 0.1%
0.84                  2       < 0.1%
0.85                  3       0.1%
1.002222222           1       < 0.1%
1.02                  1       < 0.1%
1.03875               1       < 0.1%

Value                 Count   Frequency (%)
77183.6               1       < 0.1%
56157.5               1       < 0.1%
13305.5               1       < 0.1%
4453.43               1       < 0.1%
3861                  1       < 0.1%
3202.92               1       < 0.1%
3096                  1       < 0.1%
1687.2                1       < 0.1%
1377.077778           1       < 0.1%
1001.2                1       < 0.1%

frequency
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1226
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5474958329
Minimum: 0.005449591281
Maximum: 17
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0.005449591281
5-th percentile: 0.01103448976
Q1: 0.02494806094
Median: 1
Q3: 1
95-th percentile: 1
Maximum: 17
Range: 16.99455041
Interquartile range (IQR): 0.9750519391

Descriptive statistics

Standard deviation: 0.550438394
Coefficient of variation (CV): 1.005374582
Kurtosis: 138.6786738
Mean: 0.5474958329
Median Absolute Deviation (MAD): 0
Skewness: 4.846392144
Sum: 3124.011223
Variance: 0.3029824256
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
1                     2884    50.5%
2                     48      0.8%
0.0625                18      0.3%
0.02777777778         17      0.3%
0.02380952381         16      0.3%
0.08333333333         15      0.3%
0.09090909091         15      0.3%
0.03448275862         15      0.3%
0.02941176471         14      0.2%
0.07692307692         13      0.2%
Other values (1216)   2651    46.5%

Value                 Count   Frequency (%)
0.005449591281        1       < 0.1%
0.005464480874        1       < 0.1%
0.005479452055        1       < 0.1%
0.005494505495        1       < 0.1%
0.005586592179        2       < 0.1%
0.005602240896        1       < 0.1%
0.005617977528        2       < 0.1%
0.00566572238         1       < 0.1%
0.005681818182        2       < 0.1%
0.005698005698        3       0.1%

Value                 Count   Frequency (%)
17                    1       < 0.1%
4                     1       < 0.1%
3                     5       0.1%
2                     48      0.8%
1.142857143           1       < 0.1%
1                     2884    50.5%
0.75                  1       < 0.1%
0.6666666667          3       0.1%
0.550802139           1       < 0.1%
0.5335120643          1       < 0.1%

qtde_returns
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 215
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.40308447
Minimum: 0
Maximum: 80995
Zeros: 4200
Zeros (%): 73.6%
Negative: 0
Negative (%): 0.0%
Memory size: 44.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 38.75
Maximum: 80995
Range: 80995
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1468.173455
Coefficient of variation (CV): 32.33642542
Kurtosis: 2761.915902
Mean: 45.40308447
Median Absolute Deviation (MAD): 0
Skewness: 52.0788511
Sum: 259070
Variance: 2155533.293
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value                 Count   Frequency (%)
0                     4200    73.6%
1                     169     3.0%
2                     151     2.6%
3                     105     1.8%
4                     89      1.6%
6                     78      1.4%
5                     61      1.1%
12                    52      0.9%
7                     44      0.8%
8                     43      0.8%
Other values (205)    714     12.5%

Value                 Count   Frequency (%)
0                     4200    73.6%
1                     169     3.0%
2                     151     2.6%
3                     105     1.8%
4                     89      1.6%
5                     61      1.1%
6                     78      1.4%
7                     44      0.8%
8                     43      0.8%
9                     41      0.7%

Value                 Count   Frequency (%)
80995                 1       < 0.1%
74215                 1       < 0.1%
9014                  1       < 0.1%
8004                  1       < 0.1%
4427                  1       < 0.1%
3768                  1       < 0.1%
3332                  1       < 0.1%
2878                  1       < 0.1%
2022                  1       < 0.1%
2012                  1       < 0.1%

Interactions

[Pairwise interaction plots for the 10 variables (100 panels); images not preserved.]

Correlations


Auto

The auto setting is an easily interpretable pairwise column metric that picks a method according to the types of the two variables (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
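
The calculation described here can be sketched in plain Python as follows. This is a minimal illustration rather than the report's own implementation: it assumes tied values receive the average of their rank positions, and the helper names `average_ranks` and `spearman_rho` are chosen for this example only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their std devs."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A nonlinear but strictly increasing relation still has identical rank orderings.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only ranks enter the formula, extreme values such as the largest gross_revenue entries influence ρ far less than they influence Pearson's r.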

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
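
As a minimal sketch of this definition in plain Python (the function name `pearson_r` is illustrative, not a library call), the normalizing factors of the covariance and the standard deviations cancel, so sums of deviations are enough:

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 4.0, 7.0]
ys = [3.0, 1.0, 6.0, 8.0]
# Shifting and rescaling one variable leaves r unchanged (location/scale invariance).
print(pearson_r(xs, ys))
print(pearson_r([10 * x + 3 for x in xs], ys))
```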

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
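
The pair-counting definition can be sketched directly in plain Python. This computes the simple form sometimes called tau-a; statistical libraries often report a tie-adjusted variant (tau-b), so results can differ when the data contain ties. The function name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop visits every pair of observations, so the cost grows quadratically with the number of rows; this is fine for illustration but slow for large tables.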

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

      df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  qtde_products  avg_ticket  frequency  qtde_returns
0            0        17850        5391.21         372.0           34.0      1733.0          297.0   18.152222  17.000000          40.0
1            1        13047        3232.59          56.0            9.0      1390.0          171.0   18.904035   0.028302          35.0
2            2        12583        6705.38           2.0           15.0      5028.0          232.0   28.902500   0.040323          50.0
3            3        13748         948.25          95.0            5.0       439.0           28.0   33.866071   0.017921           0.0
4            4        15100         876.00         333.0            3.0        80.0            3.0  292.000000   0.073171          22.0
5            5        15291        4623.30          25.0           14.0      2102.0          102.0   45.326471   0.040115          29.0
6            6        14688        5630.87           7.0           21.0      3621.0          327.0   17.219786   0.057221         399.0
7            7        17809        5411.91          16.0           12.0      2057.0           61.0   88.719836   0.033520          41.0
8            8        15311       60767.90           0.0           91.0     38194.0         2379.0   25.543464   0.243316         474.0
9            9        16098        2005.63          87.0            7.0       613.0           67.0   29.934776   0.024390           0.0

Last rows

      df_index  customer_id  gross_revenue  recency_days  qtde_invoices  qtde_items  qtde_products  avg_ticket  frequency  qtde_returns
5696      5787        22700        4839.42           1.0            1.0      1074.0           62.0   78.055161        1.0           0.0
5697      5788        13298         360.00           1.0            1.0        96.0            2.0  180.000000        1.0           0.0
5698      5789        14569         227.39           1.0            1.0        79.0           12.0   18.949167        1.0           0.0
5699      5790        22704          17.90           1.0            1.0        14.0            7.0    2.557143        1.0           0.0
5700      5791        22705           3.35           1.0            1.0         2.0            2.0    1.675000        1.0           0.0
5701      5792        22706        5699.00           1.0            1.0      1747.0          634.0    8.988959        1.0           0.0
5702      5793        22707        6756.06           0.0            1.0      2010.0          730.0    9.254877        1.0           0.0
5703      5794        22708        3217.20           0.0            1.0       654.0           59.0   54.528814        1.0           0.0
5704      5795        22709        3950.72           0.0            1.0       731.0          217.0   18.206083        1.0           0.0
5705      5796        12713         794.55           0.0            1.0       505.0           37.0   21.474324        1.0           0.0